Tagging with Combined Language Models and Large Tagsets
Abstract
The paper discusses experiments, results, applications and further developments in tagging a highly inflectional language, based on multiple register-diversified language models. The texts are accurately disambiguated in terms of a large tagset (611 tags) in two linear-time processing steps (tiered processing). The underlying tagger uses multiple register language models simultaneously, and the final annotation is chosen by a combined-classifier decision-making procedure.
Similar resources
The paper describes a general method (as well as its implementation and evaluation) for deriving the mapping rules for the dif
The paper describes a general method (as well as its implementation and evaluation) for deriving mapping models for different tagsets available in existing training corpora (gold standards) for a specific language. These mapping models are further used to significantly improve the accuracy in the underlying training corpora and also for the assessment of the distributional adequacy of various t...
High Accuracy Tagging with Large Tagsets
The paper presents experiments and results related to morpho-syntactic (MS) tagging of a highly inflectional language, based on combining language models (LM) learnt from multiple register-diversified corpora. To cope with a large tagset (614 tags), our underlying tagger uses a hidden smaller tagset (92 tags), mapped back, after the proper tagging, into the initial tagset. The same text is tagg...
Using a Large Set of EAGLES-compliant Morpho-Syntactic Descriptors as a Tagset for Probabilistic Tagging
The paper presents one way of reconciling data sparseness with the requirement of high accuracy tagging in terms of fine-grained tagsets. For lexicon encoding, EAGLES elaborated a set of recommendations aimed at covering multilingual requirements and therefore resulted in a large number of features and possible values. Such an encoding, used for tagging purposes, would lead to very large tagset...
Tagset Mapping and Statistical Training Data Cleaning-up
The paper describes a general method (as well as its implementation and evaluation) for deriving mapping systems for different tagsets available in existing training corpora (gold standards) for a specific language. For each pair of corpora (tagged with different tagsets), one such mapping system is derived. This mapping system is then used to improve the tagging of each of the two corpora with...
Tiered Tagging Revisited
In this paper we describe a new baseline tagset induction algorithm which, unlike the one described in previous work, is fully automatic and produces tagsets with better performance than before. The algorithm is an information-lossless transformation of the MULTEXT-EAST compliant lexical tags into a reduced tagset that can be mapped back onto the lexicon tagset fully deterministically. From the baselin...
Publication date: 2008